Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models

Authors

David Wang

Robbie Vogt

Sridha Sridharan

David Dean

Abstract

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular due to their ability to adequately represent a speaker from sparse training data and to better capture differences in speaker characteristics. This paper therefore proposes capitalizing on the advantages of eigenvoice modeling within a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
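The abstract does not state the CLR formula, so the following is only a minimal sketch of the form commonly used in the diarization literature: for two clusters, the average per-frame log-likelihood of each cluster's data under the other cluster's model, normalized by a background model (UBM), with the two terms summed. The function names, the `(weights, means, variances)` tuple layout, and the toy one-dimensional data are all assumptions made for illustration; the paper's eigenvoice-adapted models are not reproduced here, and plain diagonal-covariance GMMs stand in for them.

```python
import math

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)

def cross_likelihood_ratio(frames_i, frames_j, model_i, model_j, ubm):
    """CLR in its commonly cited form (an assumption, not taken from the paper):
    each cluster's data is scored against the other cluster's model,
    relative to the background model, and the two terms are summed."""
    term_i = gmm_avg_loglik(frames_i, *model_j) - gmm_avg_loglik(frames_i, *ubm)
    term_j = gmm_avg_loglik(frames_j, *model_i) - gmm_avg_loglik(frames_j, *ubm)
    return term_i + term_j

# Hypothetical toy models: (weights, means, variances), one component, one dimension.
ubm = ([1.0], [[0.0]], [[4.0]])
model_a = ([1.0], [[1.0]], [[1.0]])
model_b = ([1.0], [[1.2]], [[1.0]])
model_c = ([1.0], [[-2.0]], [[1.0]])

frames_a = [[0.8], [1.1], [1.3]]
frames_b = [[1.0], [1.4], [1.2]]
frames_c = [[-2.1], [-1.8], [-2.2]]

clr_ab = cross_likelihood_ratio(frames_a, frames_b, model_a, model_b, ubm)
clr_ac = cross_likelihood_ratio(frames_a, frames_c, model_a, model_c, ubm)
print(f"CLR(a, b) = {clr_ab:.3f}")
print(f"CLR(a, c) = {clr_ac:.3f}")
```

In an agglomerative clustering loop, the pair of clusters with the highest CLR would typically be merged first, and merging would stop once the best score falls below a tuned threshold. In this toy setup, clusters `a` and `b` (similar means) score higher than `a` and `c`.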


Similar resources

Speaker diarization using normalized cross likelihood ratio

In this paper, we present the Normalized Cross Likelihood Ratio (NCLR) and the advantages of using it in a speaker diarization system. First, the NCLR is used as a dissimilarity measure between two Gaussian speaker models in the speaker change detection step, and its contribution to the performance of speaker change detection is compared with those of the BIC and Hotelling's T-statistic measures. T...


Robust Speaker Clustering in Eigenspace

In this paper we propose a speaker clustering scheme working in 'Eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'Eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can be applied. Moreover, clustering can be accomplished with base models (for Eigenvoice projection) like Gaussian Mixture Models as well as conventi...


Improvement of eigenvoice-based speaker adaptation by parameter space clustering

The segmental eigenvoice method has been proposed to provide rapid speaker adaptation with limited amounts of adaptation data. In this method, the speaker-vector space is clustered into several subspaces and PCA is applied to each of the resulting subspaces. In this paper, we propose two new techniques to improve the performance of this segmental eigenvoice approach. First, we propose a soft-clus...


Emotional transplant in statistical speech synthesis based on emotion additive model

This paper proposes a novel method to transplant emotions to a new speaker in statistical speech synthesis based on an emotion additive model (EAM), which represents the differences between emotional and neutral voices. This method trains EAM using neutral and emotional speech data of multiple speakers and applies it to a neutral voice model of a new speaker (target). There is some degradation ...


T-test distance and clustering criterion for speaker diarization

In this paper, we present an application of Student's t-test to measure the similarity between two speaker models. The measure is evaluated by comparing it with other distance metrics, namely the Generalized Likelihood Ratio, the Cross Likelihood Ratio and the Normalized Cross Likelihood Ratio, in a speaker detection task. We also propose an objective criterion for speaker clustering. The criterion deduces ...


Publication date: 2011
